Search results

Article
Publication date: 2 May 2023

Giovanna Aracri, Antonietta Folino and Stefano Silvestri

Abstract

Purpose

The purpose of this paper is to propose a methodology for the enrichment and tailoring of a knowledge organization system (KOS), in order to support the information extraction (IE) task for the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.

Design/methodology/approach

A method to improve and customize an available thesaurus by leveraging documents related to tourism in Italy is first presented. The resulting thesaurus is then used to create an annotated NER corpus, exploiting distant supervision, deep learning and light human supervision.
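The distant supervision step mentioned above can be illustrated with a minimal sketch, assuming that thesaurus terms are matched against tokenized text and converted into BIO tags. The abstract does not describe the authors' implementation, so the terms, the entity labels and the function names `tokenize` and `distant_annotate` are all hypothetical. The deep learning and human review steps are not shown.

```python
import re

# Hypothetical thesaurus entries (term -> entity label); not taken from the paper.
THESAURUS = {
    "hotel": "ACCOMMODATION",
    "bed and breakfast": "ACCOMMODATION",
    "tourist guide": "PROFESSION",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def distant_annotate(tokens, thesaurus):
    """Assign BIO tags by longest-match lookup of thesaurus terms."""
    entries = {tuple(tokenize(term)): label for term, label in thesaurus.items()}
    max_len = max((len(key) for key in entries), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multiword terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = entries.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No thesaurus term starts at this token.
            i += 1
    return tags


tokens = tokenize("The bed and breakfast hired a tourist guide")
tags = distant_annotate(tokens, THESAURUS)
```

Tags produced this way are noisy, since exact matching misses unlisted variants and ignores context, which is presumably why the abstract pairs distant supervision with deep learning and light human supervision.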

Findings

The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, within the proposed methodology the KOS supports and eases the annotation task, allowing a corpus to be annotated with a fraction of the effort required for manual annotation.

Originality/value

The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.

Details

Journal of Documentation, vol. 79 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 14 December 2021

Claudia Lanza, Antonietta Folino, Erika Pasceri and Anna Perri

Abstract

Purpose

The aim of this study is a semantic comparative analysis between the current pandemic and the Spanish flu. It takes a bilingual terminological perspective, aimed at evaluating and comparing the terms used to describe and communicate pandemic-related issues to both biomedical experts and a non-specialist public.

Design/methodology/approach

The analysis is a comparative terminological investigation performed on two corpora, the first containing scientific articles in English and the second containing issues of Italian national newspapers, covering two pandemics: the Spanish flu and the current Covid-19 disease. It aims to detect semantic similarities and differences between them through computational tasks and corpus linguistics methodologies.

Findings

Given that the terms are representative across fields and relevant within specific historical eras, the study is conducted at both a synchronic and a diachronic level to discover the common lexical usages in the dissemination of pandemic-related issues.

Originality/value

The study presents the extraction of the main representative terms for the two pandemics and their use in sharing news about pandemic trends with the population. It also integrates a topic modeling procedure to discover some of the main categories representing the lexicon of the pandemics, with reference to a list of classes drawn from external thesauri and ontologies on pandemics. As a result, it provides a detailed overview of the discrepancies and similarities found in two historical corpora dealing with a common subject, i.e. the terminology of pandemics.

Details

Journal of Documentation, vol. 78 no. 4
Type: Research Article
ISSN: 0022-0418
